Segmentation of Automatically Transcribed Broadcast News Text
نویسندگان
چکیده
Expertise in the automatic transcription of broadcast speech has progressed to the point of being able to use the resulting transcripts for information retrieval purposes. In this paper, we describe the Segmentation system used by Dragon Systems in the Segmentation task of the 1998 TDT evaluation, highlighting improvements made since the September 1998 dryrun. Segmentation of closed-caption and human transcripts of news is contrasted with the results of segmenting the ASR transcripts. This will be followed by a discussion of the metric, in particular how the value of the segmentation metric relates to the value of the tracking metric, when the these latter two tasks are performed on automatically segmented ASR text, rather than the ASR text with correctly marked boundaries.
منابع مشابه
Segmentation and Indexation of Broadcast News
This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...
متن کاملIndexing Broadcast News
This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments (stories) that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on ...
متن کاملTopic Indexing of TV Broadcast News Programs
This paper describes a topic segmentation and indexation system for TV broadcast news programs spoken in European Portuguese. The system is integrated in an alert system for selective dissemination of multimedia information developed in the scope of an European Project. The goal of this work is to enhance the retrieval of specific spoken documents that have been automatically transcribed, using...
متن کاملA Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation
One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until lately, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition ...
متن کاملNetwork of Data Centres (NetDC): BNSC - An Arabic Broadcast News Speech Corpus
Broadcast news is a very rich source of Language Resources that has been exploited to develop and assess a large set of Human Language Technologies. Some examples include systems to: automatically produce text transcriptions of spoken data; identify the language of a text; translate a text from one language to another; identify topics in the news and retrieve all stories discussing a target top...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999